Fast Two-Sample Testing with Analytic Representations of Probability Measures
Abstract
We propose a class of nonparametric two-sample tests with a cost linear in the sample size. Two tests are given, both based on an ensemble of distances between analytic functions representing each of the distributions. The first test uses smoothed empirical characteristic functions to represent the distributions; the second uses distribution embeddings in a reproducing kernel Hilbert space. Analyticity implies that differences in the distributions may be detected almost surely at a finite number of randomly chosen locations/frequencies. The new tests are consistent against a larger class of alternatives than the previous linear-time tests based on the (non-smoothed) empirical characteristic functions, while being much faster than the current state-of-the-art quadratic-time kernel-based or energy distance-based tests. Experiments on artificial benchmarks and on challenging real-world testing problems demonstrate that our tests give a better power/time tradeoff than competing approaches and, in some cases, better outright power than even the most expensive quadratic-time tests. This performance advantage is retained even in high dimensions, and in cases where the difference in distributions is not observable with low-order statistics.
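The idea of comparing kernel mean embeddings at a finite number of random locations can be illustrated with a minimal sketch. The abstract does not give the exact statistic, so everything specific here is an assumption: the Gaussian kernel, the fixed bandwidth, equal sample sizes, the Hotelling-type normalisation, and the name `mean_embedding_statistic` are all hypothetical choices made for illustration. The cost is linear in the sample size n for a fixed number of locations J.

```python
import numpy as np


def mean_embedding_statistic(x, y, locations, bandwidth=1.0, ridge=1e-8):
    """Hotelling-type statistic on embedding differences at J locations.

    Illustrative sketch only: kernel, bandwidth, and normalisation are
    assumptions, not details taken from the abstract.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    t = np.asarray(locations, dtype=float)
    if x.shape != y.shape:
        raise ValueError("this sketch assumes two samples of equal shape")
    n = x.shape[0]

    def gauss(a):
        # Gaussian kernel between every sample point and every location: (n, J)
        sq = ((a[:, None, :] - t[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-sq / (2.0 * bandwidth ** 2))

    z = gauss(x) - gauss(y)          # per-observation differences, shape (n, J)
    w = z.mean(axis=0)               # empirical embedding difference at each location
    # Sample covariance of the differences, with a small ridge for stability
    sigma = np.atleast_2d(np.cov(z, rowvar=False)) + ridge * np.eye(t.shape[0])
    return float(n * w @ np.linalg.solve(sigma, w))


# Usage on synthetic data (assumed setup): same distribution vs. a mean shift
rng = np.random.default_rng(0)
n, d, J = 2000, 2, 5
locations = rng.normal(size=(J, d))  # randomly chosen test locations
stat_null = mean_embedding_statistic(
    rng.normal(size=(n, d)), rng.normal(size=(n, d)), locations
)
stat_alt = mean_embedding_statistic(
    rng.normal(size=(n, d)), rng.normal(loc=1.0, size=(n, d)), locations
)
```

When the two samples come from the same distribution, a statistic of this form is approximately chi-squared with J degrees of freedom for large n, so it can be compared against a chi-squared quantile to decide whether to reject. Under the mean shift, `stat_alt` should come out far larger than `stat_null`.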
Similar resources
Credit risk analysis of cash flow CDO structures
We develop a method that offers consistent and computationally efficient credit risk analysis of cash flow CDO structures. The proposal makes use of simple portfolio models that admit semi-analytic representations of the loss distribution, combined with detailed and fast calculations of realistic interest and principal cash flow waterfalls. We define in this context and study credit tranche risk measu...
The Effect of Length of Pre-task Planning Time on Discourse-analytic Measures and Analytic Ratings in L2 Written Narratives
The favorable gains gleaned from the provision of pre-task planning time (PTP) have struck a chord with SLA researchers as they try to manipulate task features to promote language production and development. In a similar vein, the present study is a two-fold attempt to first compare the effect of the length of pre-task planning time on discourse-analytic measures in narrative written production...
Globally analytic $p$-adic representations of the pro-$p$-Iwahori subgroup of $GL(2)$ and base change, I: Iwasawa algebras and a base change map
This paper extends to the pro-$p$ Iwahori subgroup of $GL(2)$ over an unramified finite extension of $\mathbb{Q}_p$ the presentation of the Iwasawa algebra obtained earlier by the author for the congruence subgroup of level one of $SL(2, \mathbb{Z}_p)$. It then describes a natural base change map between the Iwasawa algebras or, more correctly, as it turns out, between the global distribut...
Measuring Hospital Performance Using Mortality Rates: An Alternative to the RAMR
Background: The risk-adjusted mortality rate (RAMR) is used widely by healthcare agencies to evaluate hospital performance. The RAMR is insensitive to case volume and requires a confidence interval for proper interpretation, which results in a hypothesis testing framework. Unfamiliarity with hypothesis testing can lead to erroneous interpretations by the public and other stakeholders. We argue t...
Representations of Double Coset Lie Hypergroups
We study the double cosets of a Lie group by a compact Lie subgroup. We show that a Weil formula holds for double coset Lie hypergroups and show that certain representations of the Lie group lift to representations of the double coset Lie hypergroup. We characterize smooth (analytic) vectors of these lifted representations.
Publication date: 2015